Climate Change
If you’ve spent time on social media in the last few years, you may have noticed one particular topic trending often. Now, more than ever, discussions of climate change and its impacts are mainstream. As students in the School of Public Health, our group was united by an interest in climate change and public health. We wanted to quantify the relationship between climate measure variables like particulate matter, ozone levels, and uv radiation, and see how they may impact health outcomes like asthma hospitalizations, lung cancer, and melanoma.
In exploring our outcomes, we asked the following questions in getting to know our data sets:
What are the current incidence rates of lung cancer, melanoma, and asthma emergency room visits? Do these trends differ significantly at the county and state levels?
How have these rates changed by state over time?
In analyzing our exposures and how they impact our outcomes, we asked the following:
How do lung cancer age-adjusted incidence rates vary annually based on median particulate matter levels? Does this change from state to state?
How do emergency department visits and/or hospitalization rates for asthma vary annually based on
median particulate matter levels? Does this change from state to state?
How do melanoma age-adjusted incidence rates vary annually based on uv or ozone levels? Does this change from state to state?
Is annual median particulate matter level a good predictor of lung cancer age-adjusted incidence rates? How does the regression change when state is added to the model?
Is annual median ozone level or uv level a good predictor of melanoma age-adjusted incidence rates? How does the regression change when state is added to the model?
Is annual median particulate matter level a good predictor of annual asthma-related emergency room visits or hospitalizations rates? How does the regression change when state is added to the model?
Exposures
Our ozone data and our air pollution data were both sourced from the CDC’s National Environmental Public Health Tracking website.
Outcomes
For county level cancer outcomes, each state’s data set was sourced from the NIH State Cancer Profiles.
For state level outcomes, each data set was sourced directly from health departments of each respective state.
Asthma
Melanoma
Lung Cancer
A significant amount of cleaning was required, as can be seen in the code chunks below:
exposures data NEEDS CODE CHUNKS
state level outcomes data NEEDS ASTHMA SET UP DATA
#state-level lung cancer PA data https://www.phaim1.health.pa.gov/EDD/
pa_cancer = read.csv('./data/lung/pa_state_lung.csv', sep = ";", header = T, skip = 3) %>%
janitor::clean_names() %>%
select(year, rate_ratio_result) %>%
map_df(str_replace, pattern = ",", replacement = ".") %>%
map_df(as.numeric) %>%
rename(c("age_adjusted_incidence_rate" = "rate_ratio_result")) %>%
add_column(state = "PA")
#state-level lung cancer OH data https://publicapps.odh.ohio.gov/EDW/DataBrowser/Browse/StateLayoutLockdownCancers
oh_cancer = read.csv('./data/lung/oh_state_lung.csv', sep = ";") %>%
janitor::clean_names() %>%
rename(year = cancer_year_year) %>%
select(year, age_adjusted_rate) %>%
map_df(str_replace, pattern = ",", replacement = ".") %>%
map_df(as.numeric) %>%
rename(c("age_adjusted_incidence_rate" = "age_adjusted_rate")) %>%
add_column(state = "OH") %>%
distinct() %>%
filter(complete.cases(.))
#state-level lung cancer NY data https://www.health.ny.gov/statistics/cancer/registry/table2/tb2lungnys.htm
ny_cancer = read.csv("./data/lung/ny_state_lung.csv", skip = 2)[ ,1:3] %>%
janitor::clean_names() %>%
rename(year = x) %>%
select(year, rate_per_100_000_population) %>%
rename(c("age_adjusted_incidence_rate" = "rate_per_100_000_population")) %>%
add_column(state = "NY")
# Read in the data and set the column names
# Add a state column and make all rows = "NY"
# Make the year column of type integer
# Add a cancer type column and make all rows = "melanoma"
# Select the state, year, cancer_type and mf_case_rate columns
# Rename mf_case_rate column "age_adjusted_incidence_rate"
nys_melanoma_age_adjusted <- read_csv(here("data",
"incidence_melanoma_data",
"nys_melanoma_incidence_mortality_year.csv"),
skip = 3,
col_names = c("year",
"mf_cases",
"mf_case_rate",
"mf_case_rate_ci",
"m_cases",
"m_case_rate",
"m_case_rate_ci",
"f_cases",
"f_case_rate",
"f_case_rate_ci",
"mf_deaths",
"mf_death_rate",
"mf_death_rate_ci",
"m_deaths",
"m_death_rate",
"m_death_rate_ci",
"f_deaths",
"f_death_rate",
"f_death_rate_ci"
)) %>%
mutate(state = "NY",
year = as.integer(year),
cancer_type = "melanoma") %>%
select(state, year, cancer_type, mf_case_rate) %>%
rename(age_adjusted_incidence_rate = mf_case_rate)
# Read in data set
ohio_cancer_incidence <- readxl::read_xls(here("data",
"incidence_melanoma_data",
"ohio_cancer_incidence_year.xls"),
skip = 1)
###############################################################################
# Melanoma cases
# Get cases for each year
ohio_cancer_cases <- ohio_cancer_incidence %>%
select("Site/Type", starts_with("Case"))
# Update column names for the Ohio cancer cases data frame
colnames(ohio_cancer_cases) <- c("type", 1996:2018, "total")
# Get melanoma of the skin cancer cases for Ohio
# Pivot longer year
# Select year and cases variables
ohio_melanoma_cases <- ohio_cancer_cases %>%
filter(type == "Melanoma of Skin") %>%
pivot_longer("1996":total,
names_to = "year",
values_to = "cases") %>%
select(year, cases)
###############################################################################
# Melanoma age-adjusted
# Get age-adjusted for each year
ohio_cancer_age_adjusted <- ohio_cancer_incidence %>%
select("Site/Type", starts_with("Age"))
# Update column names for Ohio cancer age-adjusted data frame
colnames(ohio_cancer_age_adjusted) <- c("type", 1996:2018, "total")
# Get melanoma of the skin cancer age-adjusted for Ohio
# Pivot longer year
# Select year and age_adjusted variables
# Remove total row
# Add a state column and set all rows = "OH"
# Convert type of year column to be integer
# Add a cancer type column and set all rows = "melanoma"
# Convert type of age_adjusted_incidence_rate column to be double
# Reorder columns: state, year, cancer_type, age_adjusted_incidence_rate
ohio_melanoma_age_adjusted <- ohio_cancer_age_adjusted %>%
filter(type == "Melanoma of Skin") %>%
pivot_longer("1996":total,
names_to = "year",
values_to = "age_adjusted_incidence_rate") %>%
select(year, age_adjusted_incidence_rate) %>%
filter(year != "total") %>%
mutate(state = "OH",
year = as.integer(year),
cancer_type = "melanoma",
age_adjusted_incidence_rate = as.double(age_adjusted_incidence_rate)) %>%
select(state, year, cancer_type, age_adjusted_incidence_rate)
# Final data frames: ohio_melanoma_cases and ohio_melanoma_age_adjusted
# Read in data, skip first three lines
# Clean variable names
# Select relevant variables
# Add a state column and set all rows = "PA"
# Convert type of year column to be integer
# Add a cancer_type column and set all rows = "melanoma"
# Rename rate_ratio_result column to "age_adjusted_incidence_rate"
# Reorder columns: state, year, cancer_type, age_adjusted_incidence_rate
# Sort data frame by ascending year
pa_melanoma_age_adjusted <- read_csv(here("data",
"incidence_melanoma_data",
"pa_melanoma_incidence_year.csv"),
skip = 3) %>%
janitor::clean_names() %>%
select(year, rate_ratio_result) %>%
mutate(state = "PA",
year = as.integer(year),
cancer_type = "melanoma") %>%
rename(age_adjusted_incidence_rate = rate_ratio_result) %>%
select(state, year, cancer_type, age_adjusted_incidence_rate) %>%
arrange(year)
county level outcomes data NEEDS ASTHMA SET UP DATA
read_and_clean_county_data <- function(file_path_name, state, outcome) {
read_csv(file_path_name,
skip = 8) %>%
janitor::clean_names() %>%
select(county, age_adjusted_incidence_rate_rate_note_cases_per_100_000) %>%
mutate(county = stringr::str_remove(county, "\\(.\\)$"),
state = state,
outcome = outcome,
age_adjusted_incidence_rate =
as.numeric(age_adjusted_incidence_rate_rate_note_cases_per_100_000)) %>%
select(state, county, outcome, age_adjusted_incidence_rate)
}
#county-level lung cancer PA data https://statecancerprofiles.cancer.gov/
pa_county_lc = read_and_clean_county_data('./data/lung/pa_lung_data_county.csv',
state = "PA",
outcome = "lung cancer") %>%
filter(complete.cases(.)) %>%
slice(-c(1, 2))
#county-level lung cancer OH data https://statecancerprofiles.cancer.gov/
oh_county_lc = read_and_clean_county_data('./data/lung/oh_lung_data_county.csv',
state = "OH",
outcome = "lung cancer") %>%
filter(complete.cases(.)) %>%
slice(-c(1, 2))
#county-level lung cancer NY data https://statecancerprofiles.cancer.gov/
ny_county_lc = read_and_clean_county_data('./data/lung/ny_lung_data_county.csv',
state = "NY",
outcome = "lung cancer") %>%
filter(complete.cases(.)) %>%
slice(-c(1, 2))
# Read in New York melanoma incidence county-level data averaged from 2014 to
# 2018
nys_county_melanoma_incidence_2014_2018 <-
read_and_clean_county_data(here("data",
"incidence_melanoma_data",
"nys_county_melanoma_incidence_2014_2018.csv"),
"NY",
"melanoma")
# Read in Ohio melanoma incidence county-level data averaged from 2014 to 2018
ohio_county_melanoma_incidence_2014_2018 <-
read_and_clean_county_data(here("data",
"incidence_melanoma_data",
"ohio_county_melanoma_incidence_2014_2018.csv"),
"OH",
"melanoma")
# Read in Pennsylvania melanoma incidence county-level data averaged from
# 2014 to 2018
pa_county_melanoma_incidence_2014_2018 <-
read_and_clean_county_data(here("data",
"incidence_melanoma_data",
"pa_county_melanoma_incidence_2014_2018.csv"),
"PA",
"melanoma")
Then, we were able to merge both of our exposures into one data set, apuv, with the following variables:
NEEDS CODE CHUNKS
Year. Year
Countyfips. County-level unique identifier
County. Name of county
State. 2-character abbreviation of state name
Season. Period of year in which data was collected
Pm25_max_pred.
Pm25_med_pred.
Pm25_mean_pred.
Pm25_pop_pred.
O3_max_pred.
O3_med_pred.
O3_mean_pred.
O3_pop_pred.
Edd.
Edr.
I305.
I310.
I324.
I380.
Next, we were able to merge all of our state level outcomes into one data set, lc_mel_asthma_state:
# merging state level data for lung cancer
state_lc <- rbind(pa_cancer, oh_cancer, ny_cancer)
state_lc_wide = state_lc %>%
pivot_wider(
names_from = state,
values_from = age_adjusted_incidence_rate) %>%
rename(c("PA_AAIR" = "PA", "NY_AAIR" = "NY", "OH_AAIR" = "OH"))
write_csv(state_lc_wide, "./data/state_lc_wide.csv")
# Merge state-level data frames with bind_rows
state_melanoma_age_adjusted <-
bind_rows(nys_melanoma_age_adjusted,
ohio_melanoma_age_adjusted,
pa_melanoma_age_adjusted)
# read in longitudinal state data for lung and skin cancer
# longitudinal state level lung cancer data
state_lc_data = read_csv("./data/lung/state_lc_data.csv")
# longitudinal state level melanoma data
state_mel_data = read_csv("./data/incidence_melanoma_data/state_melanoma_age_adjusted.csv") %>%
rename(c("outcome" = "cancer_type"))
# combine cancer outcome data
lc_mel_state = bind_rows(state_lc_data, state_mel_data) %>%
select(state, year, outcome, age_adjusted_incidence_rate)
# Read in longitudinal state-level asthma data
# Rename and select columns
asthma_state <- read_csv(here::here("data", "asthma_state.csv")) %>%
rename(age_adjusted_incidence_rate = rate) %>%
select(state, year, outcome, age_adjusted_incidence_rate)
# Bind rows of longitudinal asthma and longitudinal cancer by state and arrange
# accordingly
lc_mel_asthma_state <- bind_rows(lc_mel_state, asthma_state) %>%
arrange(state, year, outcome)
Then, we could do the same with our county-level outcomes to get our lc_mel_asthma_county data set
fips_codes = read.csv('./data/fips_codes.csv') %>%
janitor::clean_names() %>%
filter(state %in% c("NY", "PA", "OH")) %>%
rename(c("county" = "name"))
fips_codes$county[fips_codes$county == "St Lawrence"] <- "St. Lawrence"
county_lc <- rbind(pa_county_lc, oh_county_lc, ny_county_lc) %>%
mutate(county = (gsub('.{7}$', '', county)))
lc_fips <- merge(county_lc, fips_codes, by = c("state", "county"))
# Merge county-level data frames with bind_rows
state_county_melanoma_incidence_2014_2018 <-
bind_rows(nys_county_melanoma_incidence_2014_2018,
ohio_county_melanoma_incidence_2014_2018,
pa_county_melanoma_incidence_2014_2018)
county_lc = read_csv("./data/lung/county_lc.csv")
county_mel = read_csv("./data/incidence_melanoma_data/state_county_melanoma_incidence_2014_2018.csv")
# cleaning to make the data sets compatible
county_mel = county_mel %>%
rename(c("outcome" = "cancer_type")) %>%
drop_na()
county_mel <- county_mel[!(county_mel$county == "Ohio" | county_mel$county == "Pennsylvania" |
county_mel$county == "New York"), ] %>%
filter(!grepl('SEER', county))
county_mel$county <- gsub(" County","", county_mel$county)
county_mel$county[county_mel$county == "St Lawrence"] <- "St. Lawrence"
county_lc$county <- gsub(" County","", county_lc$county)
# binding rows
lc_mel_county = bind_rows(county_lc, county_mel) %>%
filter(complete.cases(.))
# Add FIPS column to the data
fips = fips_codes %>%
janitor::clean_names()
# fips df has 'St Lawrence' instead of 'St. Lawrence'
fips$county[fips$county == "St Lawrence"] <- "St. Lawrence"
lc_mel_county = merge(lc_mel_county, fips, by = c("state", "county"))
# Read in cross-sectional asthma data
# Only get county data for the year 2016 since we need county data at
# a fixed time point (i.e. not longitudinal)
# Adjust column names and types in prep for merge
asthma_county <- read_csv(here::here("data", "asthma_county.csv")) %>%
filter(year == 2016) %>%
select(-year) %>%
rename(age_adjusted_incidence_rate = rate) %>%
mutate(age_adjusted_incidence_rate = as.double(age_adjusted_incidence_rate))
# asthma_county has 'New York County' instead of 'New York' in fips df, and "GAllia" instead of "Gallia"
asthma_county$county[asthma_county$county == "New York County"] <- "New York"
asthma_county$county[asthma_county$county == "GAllia"] <- "Gallia"
asthma_county <- merge(asthma_county, fips, by = c("state", "county"))
# Bind rows of cancer and asthma by county and arrange accordingly
lc_mel_asthma_county <- bind_rows(lc_mel_county, asthma_county) %>%
arrange(state, county, outcome)
lc_mel_asthma_county
state. 2-letter abbreviation
county. name of county
outcome. type of outcome (lung cancer, melanoma, asthma ED visit)
age_adjusted_incidence_rate. per 100,000
fips. county-level unique identifier
lc_mel_asthma_state
state. 2-letter abbreviation
year. Year
outcome. type of outcome (lung cancer, melanoma, asthma ED visit)
age_adjusted_incidence_rate. per 100,000
Visualizations, summaries, and exploratory statistical analyses. Justify the steps you took, and show any major changes to your ideas.
Initial assessment of asthma data explored the two metric for measuring prevalence in the population, namely emergency department visits and hospitalizations due to asthma. STILL WORKING ON THIS
To better understand the potential association between the climate conditions and disease risks, we may first look at the conditions at the county level for these 3 states from 2005 to 2015.
## Read in tidied air pollution and radiation data. See the Climate page for how we organized it.
apuv_df = readRDS("./ap/apuv_map/apuv.RDS")
## Wrangle and plot a year versus exposure time series line groph. We used the average seasonal data for each of the years.
apuv_plot =
apuv_df %>%
filter(state!="ME") %>%
unite(season_year, c("season", "year"), sep = "_") %>%
mutate(
season_year = factor(season_year),
season_year = fct_inorder(season_year),
county = factor(county)
) %>%
arrange(state, county) %>%
unite(county_state, c("county", "state"), sep = ", ") %>%
mutate(
county_state = factor(county_state),
county_state = fct_inorder(county_state)
) %>%
select(season_year, county_state, pm25_pop_pred, o3_pop_pred, edd) %>%
pivot_longer(
pm25_pop_pred:edd,
names_to = "climate",
values_to = "level"
) %>%
mutate(
climate = factor(climate, levels = c("pm25_pop_pred", "o3_pop_pred", "edd"),
labels = c("PM2.5", "O3", "UV"))
) %>%
ggplot(aes(x=season_year, y=level, group=1, color=county_state)) +
geom_line() +
scale_x_discrete(breaks=c("Spring_2005", "Winter_2015"),
labels=c("2005", "2015")) +
labs(
title = "Air Pollutant Concentrations and UV Intensities over the Years",
color = "County",
y = "Concentration/Intensity"
) +
theme(
axis.title.x=element_blank()
) +
facet_grid(climate~., scales = "free_y")
ggplotly(apuv_plot)
We next do a violin plot for the health outcomes at the county level, using asthma incidence rate from 2016, and lung cancer and melanoma incidence rates from 2014-2018.
## Read in tidied county level health outcome data. See the Health page for how we organized it.
library(beeswarm)
dis_df = readRDS("./ap/apuv_map/dis_rates.RDS")
beeswarm_dis_df =
dis_df %>%
arrange(state, county) %>%
unite(county_state, c("county", "state"), sep = ", ") %>%
mutate(
county_state = factor(county_state),
county_state = fct_inorder(county_state)
) %>%
select(-fips)
beeswarm_dis_plot =
beeswarm(age_adjusted_incidence_rate ~ outcome,
data = beeswarm_dis_df,
method = 'swarm',
pwcol = county_state,
corral = "wrap",
)
dis_plot =
ggplot(beeswarm_dis_plot, aes(x, y)) +
theme_minimal() +
geom_violin(aes(x, y, group = x.orig), color = "grey", fill = "grey") +
geom_point(aes(color = col)) +
scale_x_discrete(labels=c("Asthma", "Lung Cancer", "Melanoma")) +
labs(
title = "Health Outcome Rates in the Following Years",
color = "County",
y = "Rate (Count per 100K)"
) +
theme(
axis.title.x=element_blank()
)
ggplotly(dis_plot)
Let’s play around these two plots and see we may find counties with standout rates in health outcomes have abnormal climate conditions.
Once all the data was gathered for both exposures and outcomes of interest, we sought to visualize the data in a cross-sectional manner with incidence rates of outcomes plotted onto maps indicating the extent of exposures for a given year. In this case, we used data from 2016 (with the exception of UV dat which was not available and the 2015 data was used in its place). This was accomplished by first plotting maps with a gradient fill to indicate relative levels of the exposure (i.e UV radiation, particulate matter, or low-atomspheric ozone) at the county level. Bubble markers were then overlain with each bubble’s size and shading indicating the relative incidence of outcomes (i.e. hospitalization due to asthma, melanoma incidence, and lung cancer incidence). These layers were then put into plot_ly via ggplotly for interactivity with each marker giving all available data for a given county.
The follow code chunk prepares the data frames for both the map and the data.
# Load relavant libraries and setup
library(viridis)
library(mapproj)
library(maps)
library(housingData)
options(
ggplot2.continuous.colour = "viridis",
ggplot2.continuous.fill = "viridis",
knitr.kable.NA = ""
)
scale_colour_discrete = scale_color_viridis_d
scale_fill_discrete = scale_fill_viridis_d
# Prepare map data
state_df <-
map_data("state") %>%
filter(region == "ohio"|
region == "new york" |
region == "pennsylvania" )
county_df <-
map_data("county") %>%
filter(region == "new york" |
region == "pennsylvania" |
region == "ohio")
# obtain list of unique counties with a lat-long measurement associated to it
counties_unique <-
county_df %>%
filter(region == "ohio"|
region == "new york" |
region == "pennsylvania" ) %>%
as.tibble() %>%
distinct(subregion, region, .keep_all = TRUE) %>%
select(subregion, region, everything()) %>%
unite("subreg_reg", subregion:region, remove = FALSE)
# Prepare exposure data
o3 <- read_csv("./ap/ap_uv/o3.csv") %>%
janitor::clean_names() %>%
filter(state == "NY" | state == "OH" | state == "PA",
season == "Summer",
year == 2016) %>%
select(-hover, -season) %>%
mutate(county = tolower(county))
pm25 <- read_csv("./ap/ap_uv/pm25.csv") %>%
janitor::clean_names() %>%
filter(state == "NY" | state == "OH" | state == "PA",
season == "Summer",
year == 2016) %>%
select(-hover, -season) %>%
mutate(county = tolower(county))
# There is no UV data for 2016 so values from 2015 will be used
uv <- read_csv("./ap/ap_uv/edd.csv") %>%
janitor::clean_names() %>%
filter(state == "NY" | state == "OH" | state == "PA",
season == "Summer",
year == 2015) %>%
select(countyfips, uv)
# joing the dataframes for all exposures
exposures <-
o3 %>%
select(countyfips, o3) %>%
left_join(pm25, by = "countyfips") %>%
left_join(uv, by = "countyfips") %>%
rename(region = state,
subregion = county) %>%
mutate(region = str_replace(region, "NY", "new york"),
region = str_replace(region, "PA", "pennsylvania"),
region = str_replace(region, "OH", "ohio")) %>%
select(-year)
# preparing outcomes dataframe (all taken from county level data availabl for 2016)
outcomes <-
read_csv("./data/lc_mel_asthma_county.csv") %>%
rename(rate = age_adjusted_incidence_rate) %>%
pivot_wider(names_from = outcome, values_from = rate) %>%
janitor::clean_names() %>%
rename(region = state,
subregion = county) %>%
mutate(subregion = tolower(subregion),
region = str_replace(region, "NY", "new york"),
region = str_replace(region, "PA", "pennsylvania"),
region = str_replace(region, "OH", "ohio"))
# combine outcomes and expousre into one datafram based on counties
county_df <-
exposures %>%
left_join(outcomes, by = c("subregion", "region")) %>%
select(-fips) %>%
mutate(countyfips = as.character(countyfips)) %>%
select(subregion, region, everything()) %>%
unite("subreg_reg", subregion:region, remove = TRUE)
# combine county datafram with exposure and outcome data to unqiue lat-long (will be used for data points)
centers <-
geoCounty %>%
filter(state == "NY" | state == "PA" | state == "OH") %>%
select(-county, -state) %>%
rename(countyfips = fips,
region = rMapState,
subregion = rMapCounty) %>%
as.tibble
text_df <-
county_df %>%
left_join(centers, by = "countyfips") %>%
janitor::clean_names() %>%
mutate(mytext = paste(
"Location: ", subregion, ", ",region, "\n",
"Melanoma Incidence: ", melanoma, "\n",
"Lung Cancer Incidence: ", lung_cancer, "\n",
"Asthma Hospitalization Incidence: ", asthma, "\n",
"Summer UV (2015): ", uv, "\n",
"Summer PM2.5: ", pm2_5, "\n",
"Summer ozone: ", o3,sep=""))
# prepare combined dataframe for combination to map database with borders for counties (will be used for plotting county-polygons and associated gradient fill)
map_df <-
text_df %>%
select(-lat,-lon, -mytext)
county_map <- map_data("county") %>%
filter(region == "ohio"|
region == "new york" |
region == "pennsylvania" ) %>%
select(subregion, region, everything()) %>%
unite("subreg_reg", subregion:region, remove = TRUE) %>%
left_join(map_df, by = "subreg_reg")
This next code chunk sets up each plot.
# Start plotting
# PM2.5 on the map and datapoints for lung cancer ####
p_pm_lc <-
ggplot(county_map) +
geom_polygon(aes(x = long, y = lat, group = group,
fill = pm2_5),
alpha = 0.6,
color = "black") +
scale_fill_gradient(low = "white", high = "black", name = "PM2.5 (µg/m^3)") +
geom_point(data = text_df,
aes(x = lon, y = lat,
size = lung_cancer,
color = lung_cancer,
text = mytext),
alpha = 0.6
) +
scale_color_viridis(name = "Lung Cancer Incidence\n(per 100 000)") +
scale_alpha_continuous() +
theme_minimal() + coord_map() +
scale_x_continuous("") +
scale_y_continuous("")
p_pm_lc <- ggplotly(p_pm_lc, tooltip = "mytext")
# PM2.5 on the map and datapoints for asthma ####
p_pm_as <-
ggplot(county_map) +
geom_polygon(aes(x = long, y = lat, group = group,
fill = pm2_5),
alpha = 0.6,
color = "black") +
scale_fill_gradient(low = "white", high = "black", name = "PM2.5 (ug/m^3)") +
geom_point(data = text_df,
aes(x = lon, y = lat,
size = asthma,
color = asthma,
text = mytext),
alpha = 0.6
) +
scale_color_viridis(name = "Asthma Hospitalization\nIncidence\n(per 100 000)") +
scale_alpha_continuous() +
theme_minimal() + coord_map() +
scale_x_continuous("") +
scale_y_continuous("")
p_pm_as <- ggplotly(p_pm_as, tooltip = "mytext")
# O3 on the map and datapoints for lung cancer ####
p_o3_lc <-
ggplot(county_map) +
geom_polygon(aes(x = long, y = lat, group = group,
fill = o3),
alpha = 0.6,
color = "black") +
scale_fill_gradient(low = "white", high = "black", name = "Ozone (ppb)") +
geom_point(data = text_df,
aes(x = lon, y = lat,
size = lung_cancer,
color = lung_cancer,
text = mytext),
alpha = 0.7
) +
scale_color_viridis(name = "Lung Cancer Incidence\n(per 100 000)") +
scale_alpha_continuous() +
theme_minimal() + coord_map() +
scale_x_continuous("") +
scale_y_continuous("")
p_o3_lc <- ggplotly(p_o3_lc, tooltip = "mytext")
# O3 on the map and datapoints for asthma ####
p_o3_as <-
ggplot(county_map) +
geom_polygon(aes(x = long, y = lat, group = group,
fill = o3),
alpha = 0.6,
color = "black") +
scale_fill_gradient(low = "white", high = "black", name = "Ozone (ppb)" ) +
geom_point(data = text_df,
aes(x = lon, y = lat,
size = asthma,
color = asthma,
text = mytext),
alpha = 0.7
) +
scale_color_viridis(name = "Asthma Hospitalization\nIncidence\n(per 100 000)") +
scale_alpha_continuous() +
theme_minimal() + coord_map() +
scale_x_continuous("") +
scale_y_continuous("")
p_o3_as <- ggplotly(p_o3_as, tooltip = "mytext")
# UV on the map and datapoints for melanoma ####
p_uv_mel <-
ggplot(county_map) +
geom_polygon(aes(x = long, y = lat, group = group,
fill = uv),
alpha = 0.6,
color = "black") +
scale_fill_gradient(low = "white", high = "black",
name = "UV (J/m^2)\n(erthemally weighted daily dose,\nSummer 2015)" ) +
geom_point(data = text_df,
aes(x = lon, y = lat,
size = melanoma,
color = melanoma,
text = mytext),
alpha = 0.7
) +
scale_color_viridis(name = "Melanoma Incidence\n(per 100 000, 2016)") +
scale_alpha_continuous() +
theme_minimal() + coord_map() +
scale_x_continuous("") +
scale_y_continuous("")
p_uv_mel <- ggplotly(p_uv_mel, tooltip = "mytext")
The plots are as follows:
Summer 2016 mean atmospheric ozone concentration (in ppb) plotted on the map with markers for lung cancer incidence rates in 2016 (age-adjusted per 100,000 people)
p_o3_lc
Summer 2016 mean atmospheric ozone concentration (in ppb) plotted on the map with markers for asthma hospitalization rates in 2016 (age-adjusted per 100,000 people)
p_o3_as
Summer 2016 mean PM 2.5 (in µg/m3) plotted on the map with markers for lung cancer incidence rates in 2016 (age-adjusted per 100,000 people)
p_pm_lc
Summer 2016 mean PM 2.5 (in µg/m3) plotted on the map with markers for asthma hospitalization rates in 2016 (age-adjusted per 100,000 people)
p_pm_as
Summer 2015 mean UV (in J/m2, erthemally weighted daily dose) plotted on the map with markers for melanoma incidence rates in 2016 (age-adjusted per 100,000 people)
p_uv_mel
There are no striking trends in the data but it is noted that some of the highest hospitalization rates due to asthma are located in the NYC metropolitan area where PM2.5 and ozone concentrations were relatively high during the summer of 2016. However, lung cancer rates were grestest in Southern Ohio and Northern New York where there are low to moderate levels of ozone and PM2.5. Melanoma incidence rates for 2016 were overall higher in Ohio where there were also moderate to high levels of UV radiation the year prior in 2015.
These initial cross-sectional visualizations hint at possible correlations but further analyses are warranted to see the extent of associations between our exposures and outcomes of interest if they exist.
If you undertake formal statistical analyses, describe these in detail
What were your findings? Are they what you expect? What insights into the data can you make?